S. Zhang and R. Sutton studied the effect of replay buffer size [1]. Since the well-known DQN paper [2], almost all experiments have fixed the buffer size at \(10^6\). Their experiments indicated that a larger buffer made learning slower. Combined Experience Replay (CER) is a method in which the latest transition is always included in the batch, alongside transitions sampled from the replay buffer. CER improves learning speed, especially when the replay buffer is large.
To use CER, sample (batch_size - 1) transitions from the replay buffer and append the latest transition to the batch.
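
The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular library's API: the class `CombinedReplayBuffer`, its `add` and `sample` methods, and the choice to store only an observation and a reward per transition are all hypothetical. It assumes uniform sampling with replacement over a NumPy-backed ring buffer.

```python
import numpy as np


class CombinedReplayBuffer:
    """Hypothetical ring buffer that always includes the latest transition."""

    def __init__(self, capacity, obs_dim, seed=None):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.reward = np.zeros(capacity, dtype=np.float32)
        self.size = 0         # number of stored transitions
        self.next_idx = 0     # slot the next transition is written to
        self.last_idx = None  # slot of the most recently added transition
        self.rng = np.random.default_rng(seed)

    def add(self, obs, reward):
        # Overwrite the oldest slot once the buffer is full.
        self.obs[self.next_idx] = obs
        self.reward[self.next_idx] = reward
        self.last_idx = self.next_idx
        self.next_idx = (self.next_idx + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        if self.size == 0:
            raise ValueError("cannot sample from an empty buffer")
        # Draw (batch_size - 1) indices uniformly, with replacement.
        idx = self.rng.integers(0, self.size, size=batch_size - 1)
        # CER: append the latest transition as the final batch element.
        idx = np.append(idx, self.last_idx)
        return {"obs": self.obs[idx], "reward": self.reward[idx]}


buf = CombinedReplayBuffer(capacity=100, obs_dim=2, seed=0)
for i in range(50):
    buf.add(obs=[i, i], reward=float(i))

batch = buf.sample(batch_size=8)
```

In this sketch the latest transition always sits in the last position of the batch. Because the other indices are drawn with replacement, the latest transition may occasionally appear among them as well; whether to exclude it from the uniform draw is a design choice left open here.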